fix(cli): exit via os._exit to dodge native SIGABRT at shutdown #321
Merged
A pyarrow/lance worker thread (loaded via lancedb in lifecycle commands) can outlive CPython finalization in a one-shot CLI subprocess and trip PyGILState_Release (SIGABRT, exit -6). It's a thread-timing race, so it is flaky, and it intermittently failed CI on unrelated PRs: it killed the erase step of test_cli_lifecycle_round_trip_init_increment_meta_erase on PR #320 (which touches only installer.py), while the same test passed on green master #319.
Route the installed `java-codebase-rag` entry through _console_script_main, which flushes stdout/stderr and calls os._exit(rc) instead of returning into the racy teardown. main() stays return-based so in-process test callers keep working.
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Stops an intermittent CI failure where a one-shot `java-codebase-rag` subprocess (e.g. `erase`) crashes with exit code `-6` (SIGABRT) during CPython interpreter shutdown: the command's logic completes successfully (`{"success": true}`), and the process is killed only during finalization.
Root cause
`_cmd_erase` (`java_codebase_rag/cli.py`) imports `lancedb` (→ pyarrow's native thread pool). When the one-shot CLI returns into normal interpreter shutdown (`raise SystemExit(main())`), a lingering pyarrow/lance worker thread still holds a `PyGILState`; finalization tears thread states down out from under it → `Py_FatalError` → `abort()` → `-6`. Whether that worker is still alive at that moment depends on thread timing, which is why the failure is flaky.
This is a distinct native crash from the kuzu scan SIGSEGV mitigated by #317: different signal (SIGABRT vs SIGSEGV), different phase (finalization vs scan), and different library (pyarrow/lance vs kuzu).
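The `abort()` → `-6` step of that chain can be observed in isolation. On POSIX, Python's `subprocess` reports a child killed by signal N as returncode `-N`, so a child that aborts surfaces with exactly the exit code seen in CI. The sketch below is a minimal illustration under that assumption: it uses `os.abort()` as a stand-in for the native `Py_FatalError` path and does not reproduce the pyarrow/lance race itself; `run_child` and `describe` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def run_child(code: str) -> int:
    """Run a one-shot Python child process and return its exit status.

    On POSIX, a child killed by signal N is reported as returncode -N.
    """
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    return proc.returncode


def describe(returncode: int) -> str:
    """Translate a subprocess returncode into a human-readable cause."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # os.abort() raises SIGABRT, the same signal a Py_FatalError ends in.
    rc = run_child("import os; os.abort()")
    print(rc, describe(rc))  # on Linux/macOS prints: -6 killed by SIGABRT
```

A normal `sys.exit(2)` in the child, by contrast, yields a non-negative returncode of `2`, which is how a test harness can tell "command failed" apart from "process was killed during shutdown".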
Why now
It intermittently turns CI red on unrelated PRs. It killed the `erase` step of `test_cli_lifecycle_round_trip_init_increment_meta_erase` on #320 (which touches only `installer.py`), while the same test passed on green master #319 90 minutes earlier. The crash happens inside the `java-codebase-rag erase` subprocess spawned by `_run_cli`, so #317's per-file pytest isolation does not affect it.
Fix
Route the installed `java-codebase-rag` entry through a thin wrapper (`_console_script_main`) that flushes stdout/stderr and calls `os._exit(rc)`, skipping the racy finalization entirely. A one-shot CLI process has already done all its real work and emitted its result before shutdown; finalization buys it nothing and is exactly where the race lives. `main()` stays return-based so in-process test callers (`cli.main([...])`) keep working.
This is a root-cause fix at the mechanism level, not test suppression: the lifecycle round-trip still runs `erase → init → increment → meta → erase` and asserts exit codes; the CLI process now returns its true exit code instead of being killed by the finalization race.
Verification
- `ruff check .`: clean
- `os._exit(rc)` contract with rc 0 and 2; pyproject wiring guard): RED → GREEN
- `pytest tests/test_java_codebase_rag_cli.py`: 54 passed, incl. the originally failing `test_cli_lifecycle_round_trip_init_increment_meta_erase`
- `pytest tests`: 776 passed, 11 skipped
Caveat: the crash was only observed on Linux CI (thread-timing race); local verification on macOS confirms the fix mechanism (every `erase` now exits `0` via `os._exit`) and no regressions. CI on Linux will be the final confirmation.
User-visible changes / reindex / env / ontology
None. No schema, ranking, ontology, env-var, or re-index impact — purely the CLI process's shutdown path.
🤖 Generated with Claude Code